Memory Hardware Support for Sparse Computations
Authors
Abstract
Address computations and indirect, hence double, memory accesses in sparse matrix application software render sparse computations inefficient in general. In this paper we propose memory architectures that support the storage of sparse vectors and matrices. In a first design, called vector storage, a matrix is handled as an array of sparse vectors, stored as singly-linked lists. Deletion and insertion of a vector is done row- or column-wise only. In a second design, called matrix storage, a higher level of sophistication is achieved. A sparse matrix is stored as a bi-directionally threaded doubly-linked list of elements. This approach enables both row- and column-wise operations. Reading a row (column) can be done at the speed of one element (real value and indices) per memory cycle, while extracting or updating takes 2 memory cycles. Inserting an element can be done once every 2.5 memory cycles. A pipelined variant with 3-fold interleaved memory and write buffers yields higher efficiency, close to one sparse matrix element per memory cycle for all basic vector operations. In-memory operations also decrease the burden on processor, cache, and bus.
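Although the designs are hardware memory architectures, their behaviour can be summarized in software terms. The following C sketch mirrors the two schemes described in the abstract; all names and the host-side layout are invented here for illustration and are not the paper's hardware design.

```c
/* Minimal software sketch of the two storage schemes; names are illustrative. */
#include <stdlib.h>

/* Vector storage: a sparse vector as a singly-linked list of elements. */
struct vec_elem {
    int              index;  /* position within the vector      */
    double           value;  /* nonzero value                   */
    struct vec_elem *next;   /* next nonzero in the same vector */
};

/* Matrix storage: each nonzero is threaded into both its row list and its
 * column list, doubly linked, so rows and columns can be traversed (and
 * elements unlinked) in either direction. */
struct mat_elem {
    int              row, col;            /* element coordinates         */
    double           value;               /* nonzero value               */
    struct mat_elem *row_next, *row_prev; /* doubly-linked row thread    */
    struct mat_elem *col_next, *col_prev; /* doubly-linked column thread */
};

/* Unlinking an element touches only its four neighbours, which is what
 * makes a small, constant extraction cost plausible in hardware.
 * (Row/column head pointers are omitted for brevity.) */
static void unlink_elem(struct mat_elem *e)
{
    if (e->row_prev) e->row_prev->row_next = e->row_next;
    if (e->row_next) e->row_next->row_prev = e->row_prev;
    if (e->col_prev) e->col_prev->col_next = e->col_next;
    if (e->col_next) e->col_next->col_prev = e->col_prev;
    free(e);
}
```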
Similar resources
High-Performance Linear Algebra Processor using FPGA
With recent advances in FPGA (Field Programmable Gate Array) technology, it is now feasible to use these devices to build special-purpose processors for floating-point-intensive applications that arise in scientific computing. FPGAs provide programmable hardware that can be used to design custom hardware without the high cost of traditional hardware design. In this talk we discuss two multi-proc...
Efficient Support of Parallel Sparse Computation for Array Intrinsic Functions of Fortran 90
Fortran 90 provides a rich set of array intrinsic functions. Each of these array intrinsic functions operates on the elements of multi-dimensional array objects concurrently. They provide a rich source of parallelism and play an increasingly important role in automatic support of data parallel programming. However, there is no such support if these intrinsic functions are applied to sparse data...
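As a concrete illustration of specializing an array intrinsic for sparse data, here is a minimal C sketch of a SUM-style reduction over a compressed sparse row (CSR) array. The CSR layout and all names are assumptions for illustration, not the runtime support proposed in that paper.

```c
/* Hypothetical sketch: an intrinsic such as SUM(A), specialized for a
 * sparse array held in compressed sparse row (CSR) form. */
struct csr_matrix {
    int     nrows;
    int    *row_ptr;  /* nrows+1 offsets into col_idx/values */
    int    *col_idx;  /* column index of each nonzero        */
    double *values;   /* the nonzero values themselves       */
};

/* SUM over a dense array visits every element; over CSR it only visits
 * the stored nonzeros, since zeros contribute nothing to the sum. */
double csr_sum(const struct csr_matrix *a)
{
    double s = 0.0;
    for (int k = 0; k < a->row_ptr[a->nrows]; ++k)
        s += a->values[k];
    return s;
}
```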
Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures
Graphics processors are increasingly used in scientific applications due to their high computational power, which comes from hardware with multiple-level parallelism and memory hierarchy. Sparse matrix computations frequently arise in scientific applications, for example, when solving PDEs on unstructured grids. However, traditional sparse matrix algorithms are difficult to efficiently parallel...
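For reference, the baseline CSR sparse matrix-vector product that such GPU auto-tuners typically start from is sketched below in plain C; the signature is illustrative.

```c
/* Reference (CPU) sketch of y = A*x with A in CSR form; GPU auto-tuners
 * specialize this kernel for the device's parallelism and memory hierarchy. */
void csr_spmv(int nrows, const int *row_ptr, const int *col_idx,
              const double *values, const double *x, double *y)
{
    for (int i = 0; i < nrows; ++i) {
        double s = 0.0;
        /* accumulate the nonzeros of row i against the dense vector x */
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            s += values[k] * x[col_idx[k]];
        y[i] = s;
    }
}
```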
A MODIFIED STEFFENSEN'S METHOD WITH MEMORY FOR NONLINEAR EQUATIONS
In this note, we propose a modification of Steffensen's method with some free parameters. These parameters are then used for further acceleration via the concept of memorization. In this way, we derive a fast Steffensen-type method with memory for solving nonlinear equations. Numerical results are also given to support the underlying theory of the article.
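For background, a minimal C sketch of the classical (unmodified) Steffensen iteration is given below; the free parameters and memory scheme of the modified method are not reproduced, and the test function is an arbitrary example.

```c
/* Classical Steffensen iteration for f(x) = 0: derivative-free, using
 * f(x + f(x)) in place of a Newton derivative evaluation. */
#include <math.h>
#include <stdio.h>

static double f(double x) { return x * x - 2.0; }  /* example: root sqrt(2) */

int main(void)
{
    double x = 1.0;                        /* initial guess            */
    for (int n = 0; n < 20; ++n) {
        double fx = f(x);
        if (fabs(fx) < 1e-12) break;       /* converged                */
        double denom = f(x + fx) - fx;     /* divided-difference slope */
        if (denom == 0.0) break;           /* guard against breakdown  */
        x -= fx * fx / denom;              /* Steffensen update        */
    }
    printf("root ~= %.12f\n", x);
    return 0;
}
```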
A GPU-Adapted Structure for Unstructured Grids
A key advantage of working with structured grids (e.g., images) is the ability to directly tap into the powerful machinery of linear algebra. This is much less the case for unstructured grids, where intermediate bookkeeping data structures stand in the way. On modern high-performance computing hardware, the conventional wisdom behind these intermediate structures is further challenged by costly memory ...
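The contrast can be made concrete with a small C sketch: on a structured grid a neighbour is reached by index arithmetic alone, while an unstructured grid needs an explicit adjacency structure. The CSR-style layout here is a generic textbook one, not the paper's GPU-adapted structure.

```c
/* Structured grid: neighbours are implicit in the index arithmetic. */
double structured_east(const double *u, int nx, int i, int j)
{
    return u[j * nx + (i + 1)];  /* east neighbour of cell (i, j) */
}

/* Unstructured grid: neighbours require explicit bookkeeping,
 * here a CSR-style adjacency list. */
struct adjacency {
    int *offsets;  /* cell i's neighbours are nbrs[offsets[i] .. offsets[i+1]) */
    int *nbrs;     /* concatenated neighbour indices                           */
};
```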
Publication date: 1994